Lithium NLP: A System for Rich Information Extraction from Noisy User Generated Text on Social Media
Abstract
In this paper, we describe the Lithium Natural Language Processing (NLP) system, a resource-constrained, high-throughput and language-agnostic system for information extraction from noisy user-generated text on social media. Lithium NLP extracts a rich set of information from text, including entities, topics, hashtags and sentiment. We discuss several real-world applications of the system currently incorporated in Lithium products. We also compare our system with existing commercial and academic NLP systems in terms of performance, information extracted and languages supported. We show that Lithium NLP is on par with, and in some cases outperforms, state-of-the-art commercial NLP systems.
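As a rough illustration of one of the simpler extraction steps the abstract lists, hashtags can be pulled out of a noisy post with a regular expression. This is a minimal sketch under stated assumptions, not the Lithium NLP implementation: the function name `extract_hashtags` and the pattern are hypothetical. Python's `\w` matches Unicode word characters, so the same pattern also accepts non-Latin hashtags.

```python
import re

# Hypothetical pattern (not Lithium's): '#' followed by word characters,
# only when the '#' is not preceded by a word character, so "lol#tag" is skipped.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")

def extract_hashtags(text: str) -> list[str]:
    """Return hashtags without '#', lowercased, de-duplicated in order of first appearance."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for match in HASHTAG_RE.finditer(text):
        seen.setdefault(match.group(1).lower(), None)
    return list(seen)

print(extract_hashtags("Loving the new phone!! #Tech #tech #NLP4life lol#notatag"))
# → ['tech', 'nlp4life']
```

Lowercasing merges case variants such as `#Tech` and `#tech`; a production system would likely also need normalization of elongated or misspelled tags, which this sketch does not attempt.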
Similar resources
The French Social Media Bank: a Treebank of Noisy User Generated Content
In recent years, statistical parsers have reached high performance levels on well-edited texts. Domain adaptation techniques have improved parsing results on text genres differing from the journalistic data most parsers are trained on. However, such corpora usually comply with standard linguistic, spelling and typographic conventions. In the meantime, the emergence of Web 2.0 communication medi...
A Hybrid Approach for Entity Extraction in Code-Mixed Social Media Data
Entity extraction is one of the important tasks in various natural language processing (NLP) application areas. There has been a significant amount of work on entity extraction, but mostly for a few languages (such as English, some European languages and a few Asian languages) and domains such as newswire. Nowadays, social media have become a convenient and powerful way to express one’s o...
Information Extraction for Social Media
The rapid growth of IT over the last two decades has led to a growth in the amount of information available online. Social media is a new way of sharing information and a continuously, instantly updated source of it. In this position paper, we propose a framework for Information Extraction (IE) from unstructured user-generated content on social media. The framework propos...
A Lexicon Based Algorithm for Noisy Text Normalization as Pre-processing for Sentiment Analysis
Sentiment analysis in the most general sense refers to the classification of a piece of text into one of three classes (positive, negative or neutral) according to its polarity. The text may be an entire document, a paragraph, a sentence, a phrase or even a single word. Most of the literature on sentiment analysis is dedicated to well-formed text as found in the newspapers, journals and ma...
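The three-way polarity classification described in this abstract can be illustrated with a toy lexicon-based scorer: sum per-word weights and map the sign of the total to a class. This is a minimal sketch under stated assumptions; the tiny `LEXICON`, its weights and the zero threshold are hypothetical and are not taken from the cited paper, which focuses on normalizing noisy text before such a step.

```python
import re

# Hypothetical toy lexicon: word -> polarity weight (illustrative values only).
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}

def classify_polarity(text: str) -> str:
    """Label text 'positive', 'negative' or 'neutral' by summing lexicon weights."""
    tokens = re.findall(r"\w+", text.lower())            # naive tokenization
    score = sum(LEXICON.get(tok, 0) for tok in tokens)   # unknown words count as 0
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_polarity("good but awful"))  # → negative
```

Because out-of-lexicon tokens score zero, noisy spellings such as "gr8" are silently ignored, which is exactly why normalization is useful as a pre-processing step.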
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation of unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP), intensified by the growing volume, heterogeneity and unstructured form of information. One of the core information extraction tasks is relation extraction wh...